CS 269 : Machine Learning Theory
Abstract
The idea behind boosting is to construct an accurate hypothesis from a set of hypotheses, each of which is guaranteed to perform slightly better than random guessing on the particular distribution of data it is given. This idea originally arose from attempts to prove the robustness of the PAC model. By robustness, we mean that slight alterations to the model's definitions should not dramatically change its conclusions, such as which classes of functions are learnable. The question of whether weak learning implies strong learning was first posed in the late 1980s by Kearns and Valiant. The first boosting algorithm was developed by Rob Schapire in order to answer this question. This paved the way for the immensely popular AdaBoost algorithm, which was developed by Freund and Schapire a few years later. To formalize this discussion, we will first review the basic definition of PAC learnability, which we now refer to as strong PAC learnability to distinguish it from the weak version. Note that this is the standard definition that we have discussed in class previously.
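For reference, the two notions of learnability at issue can be stated roughly as follows (a sketch of the usual formulation; the exact quantifiers and notation in the lecture may differ slightly):

A concept class $\mathcal{C}$ is strongly PAC learnable if there is an algorithm $A$ such that for every target $c \in \mathcal{C}$, every distribution $\mathcal{D}$ over the instance space $\mathcal{X}$, and every $\epsilon, \delta \in (0, 1)$, when given $\mathrm{poly}(1/\epsilon, 1/\delta)$ i.i.d. examples $(x, c(x))$ with $x \sim \mathcal{D}$, $A$ outputs a hypothesis $h$ such that, with probability at least $1 - \delta$,
\[
\Pr_{x \sim \mathcal{D}}[h(x) \neq c(x)] \le \epsilon .
\]
A weak learner is only required to achieve error at most $\frac{1}{2} - \gamma$ for some fixed $\gamma > 0$ (still with probability at least $1 - \delta$); the question of Kearns and Valiant is whether this weaker guarantee already implies the stronger one.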
Similar Papers
CS 269 : Machine Learning Theory Lecture 4 : Infinite Function Classes
Before stating Hoeffding's Inequality, we recall two intermediate results that we will use in order to prove it. One is Markov's Inequality and the other is Hoeffding's Lemma. (Note that in class we did not cover Hoeffding's Lemma, and only gave a brief outline of the Chernoff bounding technique and how it is used to prove Hoeffding's Inequality. Here we give a full proof of Hoeffding's Inequal...
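For context, the two intermediate results and the inequality they yield are typically stated as follows (standard forms, not a transcription of the lecture):

Markov's Inequality: for a non-negative random variable $X$ and any $a > 0$,
\[
\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}.
\]
Hoeffding's Lemma: if $X \in [a, b]$ almost surely and $\mathbb{E}[X] = 0$, then for every $s \in \mathbb{R}$, $\mathbb{E}[e^{sX}] \le e^{s^2 (b-a)^2 / 8}$. Combining the two via the Chernoff bounding technique gives Hoeffding's Inequality: for independent $X_1, \dots, X_n$ with $X_i \in [a_i, b_i]$,
\[
\Pr\Big[\Big|\tfrac{1}{n}\textstyle\sum_{i=1}^{n} (X_i - \mathbb{E}[X_i])\Big| \ge t\Big] \le 2 \exp\Big(\frac{-2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\Big).
\]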
CS 269 : Machine Learning Theory Lecture 14 : Generalization Error of Adaboost
In this lecture we will continue our discussion of the AdaBoost algorithm and derive a bound on the generalization error. We saw last time that the training error decreases exponentially with respect to the number of rounds T. However, we also want to understand the performance of this algorithm on new test data. Today we will show why the AdaBoost algorithm generalizes so well and why it avoids over...
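The exponential decrease referred to above is the standard AdaBoost training-error bound (stated here in the usual notation, where $\epsilon_t = \frac{1}{2} - \gamma_t$ is the weighted error of the weak hypothesis chosen in round $t$; this is the generic bound rather than a quotation from the notes):
\[
\widehat{\mathrm{err}}(H) \le \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)} = \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \le \exp\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big),
\]
so if every weak hypothesis has edge $\gamma_t \ge \gamma > 0$, the training error of the combined classifier $H$ falls like $e^{-2\gamma^2 T}$.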
CS 269 : Machine Learning Theory Lecture 16 : SVMs and Kernels
We previously showed that the solution to the primal problem is equivalent to the solution to the dual problem if they satisfy the following primal-dual equivalence conditions. First, we need a convex objective function, and in our case it is $\frac{1}{2}\|\vec{w}\|^2$. Second, we need convex inequality constraints $g_i$, which are $1 - y_i(\vec{w} \cdot \vec{x}_i + b)$ for $i = 1, \dots, m$. The last condition states that for each in...
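To make the setup concrete, the hard-margin primal problem and the conditions being described can be written as follows (a standard statement using the same symbols as above, not necessarily the exact form from the lecture):
\[
\min_{\vec{w}, b} \ \tfrac{1}{2}\|\vec{w}\|^2 \quad \text{s.t.} \quad g_i(\vec{w}, b) = 1 - y_i(\vec{w} \cdot \vec{x}_i + b) \le 0, \quad i = 1, \dots, m,
\]
and at an optimum there exist multipliers $\alpha_i \ge 0$ satisfying complementary slackness, $\alpha_i \, g_i(\vec{w}, b) = 0$ for every $i$, so $\alpha_i > 0$ only for points that lie exactly on the margin (the support vectors).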
The Cauchy–Schwarz divergence and Parzen windowing: Connections to graph theory and Mercer kernels
This paper contributes a tutorial-level discussion of some interesting properties of the recent Cauchy–Schwarz (CS) divergence measure between probability density functions. This measure brings together elements from several different machine learning fields, namely information theory, graph theory, and Mercer kernel and spectral theory. These connections are revealed when estimating the CS dive...
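For reference, the Cauchy–Schwarz divergence between densities $p$ and $q$ is commonly defined as (one common convention; some authors square the ratio inside the logarithm, which only rescales the measure by a factor of two):
\[
D_{\mathrm{CS}}(p, q) = -\log \frac{\int p(x)\, q(x)\, dx}{\sqrt{\int p(x)^2\, dx \int q(x)^2\, dx}},
\]
which is non-negative by the Cauchy–Schwarz inequality and zero exactly when $p = q$ almost everywhere; the Parzen-window estimator replaces each of these integrals with sums of kernel evaluations over the samples.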
Situated Learning in Computer Science Education
Sociocultural theories of learning such as Wenger and Lave’s situated learning have been suggested as alternatives to cognitive theories of learning like constructivism. This article examines situated learning within the context of computer science (CS) education. Situated learning accurately describes some CS communities like open-source software development, but it is not applicable to other ...